Exploration server on OCR

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

A New Pivoting and Iterative Text Detection Algorithm for Biomedical Images

Internal identifier: 000611 (Main/Exploration); previous: 000610; next: 000612

Authors: Songhua Xu [United States]; Michael Krauthammer [United States]

RBID : PMC:3265968

Abstract

There is interest in expanding the reach of literature mining to include the analysis of biomedical images, which often contain a paper’s key findings. Examples include recent studies that use Optical Character Recognition (OCR) to extract image text, which is then used to boost biomedical image retrieval and classification. Such studies rely on the robust identification of text elements in biomedical images, which is a non-trivial task. In this work, we introduce a new text detection algorithm for biomedical images based on iterative projection histograms. We study the effectiveness of our algorithm by evaluating its performance on a set of manually labeled, randomly selected biomedical images, and compare it against other state-of-the-art text detection algorithms. We demonstrate that a projection histogram-based approach is well suited for text detection in biomedical images, achieving an F score of 0.60, and that it performs better than comparable text detection approaches. Further, we show that iterative application of the algorithm boosts overall detection performance. A C++ implementation of our algorithm is freely available for academic use upon email request.
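The abstract names projection histograms as the core idea but does not spell out the pivoting algorithm itself. As a rough illustration of that general idea only, the sketch below implements the classical recursive XY-cut: it sums ink pixels per row (or column) of a region, splits the region at empty bins of that histogram, and alternates axes until neither axis splits any further. This is not the authors' method. It assumes a pre-binarized image given as a list of rows of 0/1 values; the helper names `runs`, `xy_cut` and `f_score` and the `min_gap` parameter are hypothetical. `f_score` assumes the reported F score is the balanced F1 (harmonic mean of precision and recall).

```python
from typing import List, Optional, Tuple

# Assumption: a binarized image as a list of rows, 1 = ink, 0 = background.
Image = List[List[int]]
# (top, bottom, left, right) with half-open bounds [top, bottom) x [left, right).
Box = Tuple[int, int, int, int]


def runs(profile: List[int], min_gap: int) -> List[Tuple[int, int]]:
    """Half-open [start, end) ranges where the projection histogram is non-zero.

    Runs separated by fewer than `min_gap` empty bins are merged
    (`min_gap` is a hypothetical tuning parameter, not from the paper).
    """
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    gap = 0
    for i, value in enumerate(profile):
        if value > 0:
            if start is None:
                start = i
            gap = 0
        elif start is not None:
            gap += 1
            if gap >= min_gap:
                spans.append((start, i - gap + 1))
                start, gap = None, 0
    if start is not None:
        spans.append((start, len(profile) - gap))
    return spans


def xy_cut(img: Image, box: Box, min_gap: int = 1,
           horizontal: bool = True, flipped: bool = False) -> List[Box]:
    """Recursive XY-cut (a classical stand-in, not the authors' algorithm)."""
    top, bottom, left, right = box
    if horizontal:
        # Row histogram: ink pixels per row inside the box.
        profile = [sum(img[r][left:right]) for r in range(top, bottom)]
        pieces = [(top + s, top + e, left, right) for s, e in runs(profile, min_gap)]
    else:
        # Column histogram: ink pixels per column inside the box.
        profile = [sum(img[r][c] for r in range(top, bottom))
                   for c in range(left, right)]
        pieces = [(top, bottom, left + s, left + e) for s, e in runs(profile, min_gap)]
    if not pieces:
        return []  # no ink in this region
    if len(pieces) == 1:
        if flipped:
            return pieces  # neither axis splits: a leaf (candidate text block)
        # The box was only tightened; try the other axis once before stopping.
        return xy_cut(img, pieces[0], min_gap, not horizontal, True)
    boxes: List[Box] = []
    for piece in pieces:
        boxes.extend(xy_cut(img, piece, min_gap, not horizontal, False))
    return boxes


def f_score(precision: float, recall: float) -> float:
    """Balanced F score (F1), assumed here: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    img = [
        [1, 1, 0, 0, 1],
        [1, 1, 0, 0, 1],
        [0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0],
    ]
    print(xy_cut(img, (0, 4, 0, 5)))
    # → [(0, 2, 0, 2), (0, 2, 4, 5), (3, 4, 1, 3)]
```

On real figures the image would first need binarization, and `min_gap` would have to be tuned so that gaps between characters are merged while gaps between separate labels are not; the iterative refinement described in the abstract is not reproduced here.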


Url:
DOI: 10.1016/j.jbi.2010.09.006
PubMed: 20887803
PubMed Central: 3265968

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">A New Pivoting and Iterative Text Detection Algorithm for Biomedical Images</title>
<author>
<name sortKey="Xu, Songhua" sort="Xu, Songhua" uniqKey="Xu S" first="Songhua" last="Xu">Songhua Xu</name>
<affiliation wicri:level="2">
<nlm:aff id="A1">Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven, CT 06510</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Connecticut</region>
</placeName>
<wicri:cityArea>Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven</wicri:cityArea>
</affiliation>
<affiliation wicri:level="2">
<nlm:aff id="A2">Oak Ridge National Laboratory, One Bethel Valley Road, Oak Ridge, TN 37831, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Oak Ridge National Laboratory, One Bethel Valley Road, Oak Ridge, TN 37831</wicri:regionArea>
<placeName>
<region type="state">Tennessee</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Krauthammer, Michael" sort="Krauthammer, Michael" uniqKey="Krauthammer M" first="Michael" last="Krauthammer">Michael Krauthammer</name>
<affiliation wicri:level="2">
<nlm:aff id="A1">Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven, CT 06510</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Connecticut</region>
</placeName>
<wicri:cityArea>Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven</wicri:cityArea>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PMC</idno>
<idno type="pmid">20887803</idno>
<idno type="pmc">3265968</idno>
<idno type="url">http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3265968</idno>
<idno type="RBID">PMC:3265968</idno>
<idno type="doi">10.1016/j.jbi.2010.09.006</idno>
<date when="2010">2010</date>
<idno type="wicri:Area/Pmc/Corpus">000087</idno>
<idno type="wicri:Area/Pmc/Curation">000087</idno>
<idno type="wicri:Area/Pmc/Checkpoint">000153</idno>
<idno type="wicri:source">PubMed</idno>
<idno type="wicri:Area/PubMed/Corpus">000039</idno>
<idno type="wicri:Area/PubMed/Curation">000039</idno>
<idno type="wicri:Area/PubMed/Checkpoint">000039</idno>
<idno type="wicri:Area/Ncbi/Merge">000087</idno>
<idno type="wicri:Area/Ncbi/Curation">000087</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">000087</idno>
<idno type="wicri:doubleKey">1532-0464:2010:Xu S:a:new:pivoting</idno>
<idno type="wicri:Area/Main/Merge">000616</idno>
<idno type="wicri:Area/Main/Curation">000611</idno>
<idno type="wicri:Area/Main/Exploration">000611</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a" type="main">A New Pivoting and Iterative Text Detection Algorithm for Biomedical Images</title>
<author>
<name sortKey="Xu, Songhua" sort="Xu, Songhua" uniqKey="Xu S" first="Songhua" last="Xu">Songhua Xu</name>
<affiliation wicri:level="2">
<nlm:aff id="A1">Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven, CT 06510</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Connecticut</region>
</placeName>
<wicri:cityArea>Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven</wicri:cityArea>
</affiliation>
<affiliation wicri:level="2">
<nlm:aff id="A2">Oak Ridge National Laboratory, One Bethel Valley Road, Oak Ridge, TN 37831, USA</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<wicri:regionArea>Oak Ridge National Laboratory, One Bethel Valley Road, Oak Ridge, TN 37831</wicri:regionArea>
<placeName>
<region type="state">Tennessee</region>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Krauthammer, Michael" sort="Krauthammer, Michael" uniqKey="Krauthammer M" first="Michael" last="Krauthammer">Michael Krauthammer</name>
<affiliation wicri:level="2">
<nlm:aff id="A1">Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven, CT 06510</nlm:aff>
<country xml:lang="fr">États-Unis</country>
<placeName>
<region type="state">Connecticut</region>
</placeName>
<wicri:cityArea>Department of Pathology &amp; Yale Center for Medical Informatics, 300 Cedar Street, New Haven</wicri:cityArea>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Journal of Biomedical Informatics</title>
<idno type="ISSN">1532-0464</idno>
<idno type="eISSN">1532-0480</idno>
<imprint>
<date when="2010">2010</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Algorithms</term>
<term>Image Interpretation, Computer-Assisted (methods)</term>
<term>Information Storage and Retrieval (methods)</term>
<term>Pattern Recognition, Automated (methods)</term>
</keywords>
<keywords scheme="MESH" qualifier="methods" xml:lang="en">
<term>Image Interpretation, Computer-Assisted</term>
<term>Information Storage and Retrieval</term>
<term>Pattern Recognition, Automated</term>
</keywords>
<keywords scheme="MESH" xml:lang="en">
<term>Algorithms</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">
<p id="P2">There is interest to expand the reach of literature mining to include the analysis of biomedical images, which often contain a paper’s key findings. Examples include recent studies that use Optical Character Recognition (OCR) to extract image text, which is used to boost biomedical image retrieval and classification. Such studies rely on the robust identification of text elements in biomedical images, which is a non-trivial task. In this work, we introduce a new text detection algorithm for biomedical images based on iterative projection histograms. We study the effectiveness of our algorithm by evaluating the performance on a set of manually labeled random biomedical images, and compare the performance against other state-of-the-art text detection algorithms. In this paper, we demonstrate that a projection histogram-based text detection approach is well suited for text detection in biomedical images, with a performance of F score of .60. The approach performs better than comparable approaches for text detection. Further, we show that the iterative application of the algorithm is boosting overall detection performance. A C++ implementation of our algorithm is freely available through email request for academic use.</p>
</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>États-Unis</li>
</country>
<region>
<li>Connecticut</li>
<li>Tennessee</li>
</region>
</list>
<tree>
<country name="États-Unis">
<region name="Connecticut">
<name sortKey="Xu, Songhua" sort="Xu, Songhua" uniqKey="Xu S" first="Songhua" last="Xu">Songhua Xu</name>
</region>
<name sortKey="Krauthammer, Michael" sort="Krauthammer, Michael" uniqKey="Krauthammer M" first="Michael" last="Krauthammer">Michael Krauthammer</name>
<name sortKey="Xu, Songhua" sort="Xu, Songhua" uniqKey="Xu S" first="Songhua" last="Xu">Songhua Xu</name>
</country>
</tree>
</affiliations>
</record>

To work with this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000611 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000611 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     PMC:3265968
   |texte=   A New Pivoting and Iterative Text Detection Algorithm for Biomedical Images
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/Main/Exploration/RBID.i   -Sk "pubmed:20887803" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd   \
       | NlmPubMed2Wicri -a OcrV1 

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024